Abstract

The issue of asymmetric uncertainties resulting from fits, nonlinear
propagation and systematic effects is reviewed. It is shown that, in
all cases, whenever a published result is given with asymmetric
uncertainties, the value of the physical quantity of interest is biased
with respect to what would be obtained by making the best use of all the
experimental and theoretical information that contributes to the
evaluation of the combined uncertainty. The probabilistic solution to
the problem is provided both in exact and in approximate forms.

1 Introduction 

We often see published results in the form

'best value' $^{+\Delta_+}_{-\Delta_-}$ ,

where $\Delta_+$ and $\Delta_-$ are usually positive. 1 As first pointed
out in Ref. [2] and discussed in a simpler but more comprehensive way in
Ref. [3], this practice is far from being acceptable and, indeed, could
bias the believed value of important physics quantities. The purpose of
the present paper is, summarizing and somewhat completing the work done in
the above references, to recall where asymmetric uncertainties stem from
and to show why, as they are usually treated, they bias the value of
physical quantities, either in the published result itself or in
subsequent analyses. Once the problems are spotted, the remedy is
straightforward, at least within the Bayesian framework (see e.g. [3], or
0] and [H] for recent reviews). In fact the Bayesian approach is
conceptually based on the intuitive idea of probability, and formally
grounded on the basic rules of probability (what are usually known as the
probability 'axioms' and the 'conditional probability definition') plus
logic. Within this framework many methods of 'conventional' statistics are
recovered, as approximations of general solutions, under well-stated
conditions of validity. Instead, in the conventional, frequentistic
approach, ad hoc formulae, prescriptions and unneeded principles are used,
often without understanding what is behind these methods - before a
'principle' there is nothing!

1 For examples of measurements having $\Delta_+$ and $\Delta_-$ with all
combinations of signs, see public online tables of Deep Inelastic
Scattering results. I want to make clear from the very beginning that it
is not my intention to blame experimental or theoretical teams which have
reported asymmetric uncertainties in the past, because we are all victims
of a bad tradition in data analysis. At least, when asymmetric
uncertainties have been given, there is some chance to correct the result,
as described in Sec. 4. Since some asymmetric contributions to the global
uncertainty almost unavoidably arise in complex experiments, I am more
worried about collaborations that never arrive at final asymmetric
uncertainties, because I must imagine they have somehow symmetrized the
result but, I am afraid, without applying the proper shifts to the 'best
value' to take into account asymmetric contributions, as will be discussed
in the present paper.

The proposed Bayesian solution to cure the troubles produced by the
usual treatment of asymmetric uncertainties is to step up from approximate
methods to the more general ones (see e.g. Ref. [3], in particular the
top-down approximation diagram of Fig. 2.2). In this paper we shall see,
for example, how the $\chi^2$ and minus log-likelihood fit 'rules' can be
derived from the Bayesian inference formulae as approximate methods and
what to do when the underlying conditions do not hold. We shall encounter
a similar situation regarding the standard formulae to propagate
uncertainty.

Some of the issues addressed here and in Refs. [2] and [3] have been
recently brought to our attention by Roger Barlow, who proposes
frequentistic ways out. Michael Schmelling had also addressed questions
related to 'asymmetric errors', particularly related to the issue of
weighted averages (Zj. The reader is encouraged to read these references
as well, to form his/her own idea about the spotted problems and the
proposed solutions.

In Sec. 2 the issue of propagation of uncertainty is briefly reviewed
at an elementary level (just focusing on the sum of uncertain independent
variables, i.e. no correlations considered) though taking into account
asymmetry in the probability density functions (p.d.f.) of the input
quantities. In this way we understand what 'might have been done' (we are
rarely in the position to know exactly "what has been done") by the
authors who publish asymmetric results and what the danger is of
improperly using such a published 'best value', as is, in subsequent
analyses. Then, in Sec. 3, we shall see where asymmetric uncertainties
stem from and what to do in order to overcome their potential troubles.
This will be done in an exact way and, whenever possible, in an
approximate way. Some rules of thumb to roughly recover sensible
probabilistic quantities (expected value and standard deviation) from
results published with asymmetric uncertainties will be given in Sec. 4.
Finally, some conclusions will be drawn.

2 Propagating uncertainty 

Determining the value of a physics quantity is seldom an end in itself. In 
most cases the result is used, together with other experimental and theoret- 
ical quantities, to calculate the value of other quantities of interest. As it is 
well understood, uncertainty on the value of each ingredient is propagated 
into uncertainty on the final result. 

If uncertainty is quantified by probability, as it is commonly done
explicitly or implicitly 2 in physics, the propagation of uncertainty is
performed using rules based on probability theory. If we indicate by
$\mathbf{X}$ the set ('vector') of input quantities and by $Y$ the final
quantity, given by the function $Y = Y(\mathbf{X})$ of the input
quantities, the most general propagation formula (see e.g. (Sj) is given
by (we stick to continuous variables):

$$ f(y) = \int \delta[y - Y(\mathbf{x})]\, f(\mathbf{x})\, d\mathbf{x} , \qquad (1) $$

where $f(y)$ is the p.d.f. of $Y$, $f(\mathbf{x})$ stands for the joint
p.d.f. of $\mathbf{X}$ and $\delta$ is the Dirac delta (note the use of
capital letters to name variables and small letters to indicate the values
that variables may assume). The exact evaluation of Eq. (1) is often
challenging, but, as discussed in Ref. [3], this formula has a nice simple
interpretation that makes its Monte Carlo implementation conceptually
easy.
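The Monte Carlo reading of Eq. (1) can be sketched in a few lines of
Python: sample the input quantities from their p.d.f.s, evaluate $Y$ for
each sampled vector, and summarize the resulting values. This is only a
minimal sketch; the function name `propagate_mc`, the choice
$Y = X_1 X_2$ and the Gaussian inputs are illustrative assumptions, not
taken from the references.

```python
import numpy as np

def propagate_mc(func, samplers, n_samples=200_000, seed=0):
    """Monte Carlo version of Eq. (1): sample X, evaluate Y = func(X)."""
    rng = np.random.default_rng(seed)
    xs = [draw(rng, n_samples) for draw in samplers]
    y = func(*xs)
    return y, float(np.mean(y)), float(np.std(y))

# Hypothetical inputs: two independent Gaussian quantities.
samplers = [
    lambda rng, n: rng.normal(10.0, 1.0, n),   # X1: mean 10, sigma 1
    lambda rng, n: rng.normal(2.0, 0.5, n),    # X2: mean 2, sigma 0.5
]

y, mean_y, std_y = propagate_mc(lambda x1, x2: x1 * x2, samplers)
print(f"E[Y] ~ {mean_y:.2f}, sigma(Y) ~ {std_y:.2f}")
```

A histogram of `y` approximates $f(y)$ itself, so mode, median and any
probability interval can be read from the same sample.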

As it is also well known, often there is no need to go through the
analytic, numerical or Monte Carlo evaluation of Eq. (1), since
linearization of $Y(\mathbf{x})$ around the expected value of $\mathbf{X}$
($E[\mathbf{X}]$) makes the calculation of expected value and variance of
$Y$ very easy, using the well known standard propagation formulae, that
for uncorrelated input quantities are

$$ E[Y] \approx Y(E[\mathbf{X}]) , \qquad (2) $$
$$ \sigma^2(Y) \approx \sum_i \left( \left.\frac{\partial Y}{\partial X_i}\right|_{E[\mathbf{X}]} \right)^2 \sigma^2(X_i) . \qquad (3) $$

2 Perhaps the reader would be surprised to learn that in the conventional
statistical approach there is no room for probabilistic statements about
the value of physics quantities (e.g. "the top mass is between 170 and
180 GeV with such percent probability", or "there is 95% probability that
the Higgs mass is lighter than 200 GeV"), calibration constants, and so
on, as discussed extensively in Ref. 0.

As far as the shape of $f(y)$ is concerned, a Gaussian one is usually
assumed, as a result of the central limit theorem. Under this assumption,
$E[Y]$ and $\sigma(Y)$ are all we need. $E[Y]$ gives the 'best value', and
probability intervals, upper/lower limits and so on can be easily
calculated. In particular, within the Gaussian approximation, the most
believable value (mode), the barycenter of the p.d.f. (expected value) and
the value that separates two adjacent 50% probability intervals (median)
coincide. If $f(y)$ is asymmetric this is no longer true and one then
needs to clarify what 'best value' means, which could be one of the above
three position parameters, or something else (in the Bayesian approach
'best value' stands for expected value, unless differently specified).

Anyhow, the Gaussian approximation is not the main issue here and, in
most real applications, characterized by several contributions to the
combined uncertainty about $Y$, this approximation is a reasonable one,
even when some of the input quantities individually contribute
asymmetrically. My concerns in this paper are more related to the
evaluation of $E[Y]$ and $\sigma(Y)$ when

1. instead of Eqs. (2)-(3), ad hoc propagation prescriptions are used in
the presence of asymmetric uncertainties;

2. the linearization implicit in Eqs. (2)-(3) is not a good approximation.

Let us start with the first point, considering, as an easy academic
example, input quantities described by the asymmetric triangular
distribution shown in the left plot of Fig. 1. The value of $X$ can range
between $-1$ and $1$, with a 'best value', in the sense of maximum
probability value, of 0.5. The interval $[-0.16, +0.72]$ gives a 68.3%
probability interval, and the 'result' could be reported as
$X_1 = 0.50^{+0.22}_{-0.66}$. This is not a problem as long as we know
what this notation means and, possibly, know the shape of $f(x)$. The
problem arises when we want to make use of this result and we do not have
access to $f(x)$ (as is often the case), or we make improper use of the
information [i.e. in the case we are aware of $f(x)$]. Let us assume, for
simplicity, that we have a second independent quantity, $X_2$, described
by exactly the same p.d.f. and reported in the same way:
$X_2 = 0.50^{+0.22}_{-0.66}$. Imagine we are now interested in the
quantity $Y = X_1 + X_2$. How should we report the result about $Y$, based
on the results about $X_1$ and $X_2$? Here are some common, but wrong,
ways to give the result:

• asymmetric uncertainties added in quadrature: $Y = 1.00^{+0.31}_{-0.93}$;

• asymmetric uncertainties added linearly: $Y = 1.00^{+0.44}_{-1.32}$.

Figure 1: Distribution of the sum of two independent quantities, each
described by an asymmetric triangular p.d.f. self-defined in the left
plot. The resulting p.d.f. (right plot) has been calculated analytically
making use of Eq. (1). This figure corresponds to Fig. 4.3 of Ref.

Indeed, in this simple case we can calculate the integral (1)
analytically, obtaining the curve shown in the plot on the right side of
Fig. 1, where several position and shape parameters have also been
reported. The 'best value' of $Y$, meant as expected value (i.e. the
barycenter of the p.d.f.), comes out to be 0.34. Even those who like to
think of the 'best value' as the value of maximum probability (density)
would choose 0.45 (note that in this particular example the mode of the
sum is smaller than the mode of each addend!). Instead, a 'best value' of
$Y$ of 1.00 obtained by the ad hoc rules, unfortunately often used in
physics, corresponds neither to the mode, nor to the expected value, nor
to the median.
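For readers who prefer a numerical check to the analytic integral, the
p.d.f. of the sum can be approximated by a discrete convolution of the
triangular p.d.f. with itself, which is what Eq. (1) reduces to for
$Y = X_1 + X_2$. The grid spacing and the helper name `triangular_pdf`
are choices made here for illustration only.

```python
import numpy as np

def triangular_pdf(x, a=-1.0, c=0.5, b=1.0):
    """Triangular p.d.f. on [a, b] with mode c (left plot of Fig. 1)."""
    x = np.asarray(x, dtype=float)
    rising = 2.0 * (x - a) / ((b - a) * (c - a))
    falling = 2.0 * (b - x) / ((b - a) * (b - c))
    pdf = np.where(x <= c, rising, falling)
    return np.where((x < a) | (x > b), 0.0, pdf)

x = np.linspace(-1.0, 1.0, 2001)
dx = x[1] - x[0]
fx = triangular_pdf(x)

# Discretized Eq. (1) for a sum of independent variables: a convolution.
fy = np.convolve(fx, fx) * dx
y = 2.0 * x[0] + dx * np.arange(fy.size)

mean_y = float(np.sum(y * fy) * dx)
std_y = float(np.sqrt(np.sum((y - mean_y) ** 2 * fy) * dx))
mode_y = float(y[np.argmax(fy)])
print(f"E[Y] = {mean_y:.3f}, sigma(Y) = {std_y:.3f}, mode(Y) = {mode_y:.3f}")
```

The expected value printed here is computed from the unrounded moments of
the triangular p.d.f., so it can differ in the last digit from the value
obtained by doubling the rounded 0.17.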

The situation would have been much better if expected value and
standard deviation of $X_1$ and $X_2$ had been reported (respectively 0.17
and 0.42). Indeed, these are the quantities that matter in 'error
propagation', because the theorems upon which propagation formulae rely -
exactly in the case $Y$ is a linear combination of $\mathbf{X}$, or
approximately in the case linearization has been performed - speak of
expected values and variances. It is easy to verify from the numbers in
Fig. 1 that exactly the correct values of $E[Y] = 0.34$ and
$\sigma(Y) = 0.59$ would have been obtained. Moreover, one can see that
expected value, mode and median of $f(y)$ do not differ much from each
other, and the shape of $f(y)$ resembles a somewhat skewed Gaussian. When
$Y$ is combined with other quantities in a subsequent analysis, its
slightly non-Gaussian shape will no longer matter. Note that we have
achieved this nice result already with only two input quantities. If we
had a few more, $Y$ would already have been much more Gaussian-like.
Instead, performing a bad combination of several quantities all skewed on
the same side would yield 'divergent' results 3: for $n = 10$ we get,
using a quadratic combination of left and right deviations,
$Y = 5.00^{+0.70}_{-2.07}$ versus the correct $Y = 1.70 \pm 1.32$.

As a conclusion from this section I would like to make some points:

• in case of asymmetric uncertainty on a quantity, reporting only the
most probable value and a probability interval (be it 68.3%, 95%, or
whatever else) should be avoided;

• expected value, meant as barycenter of the distribution, as well as
standard deviation should always be reported, providing also the shape
of the distribution (or its summary in terms of shape parameters, or
even a parameterization of the log-likelihood function in a polynomial
form, as done e.g. in Ref. [9]), if the distribution is asymmetric or
non trivial.
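The comparison between the correct and the ad hoc combination of $n$
identical triangular inputs needs only the first two moments of the
triangular p.d.f. The sketch below uses the standard closed-form mean and
variance of a triangular distribution; the function names are hypothetical
and the deviations 0.66 and 0.22 are the rounded ones read off the 68.3%
interval, so the last digits may differ slightly from those quoted in the
text.

```python
import math

def triangular_moments(a, c, b):
    """Mean and standard deviation of a triangular p.d.f. on [a, b], mode c."""
    mean = (a + b + c) / 3.0
    variance = (a * a + b * b + c * c - a * b - a * c - b * c) / 18.0
    return mean, math.sqrt(variance)

def sum_from_moments(n, mean, sigma):
    """Correct propagation for a sum of n independent, identical inputs."""
    return n * mean, math.sqrt(n) * sigma

def sum_ad_hoc(n, mode, delta_minus, delta_plus):
    """Ad hoc rule: add the modes, combine left/right deviations in quadrature."""
    return n * mode, math.sqrt(n) * delta_minus, math.sqrt(n) * delta_plus

mean_x, sigma_x = triangular_moments(-1.0, 0.5, 1.0)
for n in (2, 10):
    e_y, s_y = sum_from_moments(n, mean_x, sigma_x)
    best, minus, plus = sum_ad_hoc(n, 0.5, 0.66, 0.22)
    print(f"n={n}: correct {e_y:.2f} +/- {s_y:.2f}; "
          f"ad hoc {best:.2f} +{plus:.2f} -{minus:.2f}")
```

The gap between the two 'best values' grows linearly with $n$, while the
uncertainties grow only as $\sqrt{n}$, which is why the bias becomes
dominant for many inputs skewed on the same side.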

Note that the propagation example shown here is the most elementary
possible. The situation gets more complicated if nonlinear propagation is
also involved (see Sec. 3.2) or when quantities are used in fits (see e.g.
Sec. 12.1 of Ref. 0).

Hoping that the reader is, at this point, at least worried about the
effects of badly treated asymmetric uncertainties, let us now review the
sources of asymmetric uncertainties.

3 The reader might be curious to know what would happen in case of bad
combinations of input quantities with skewness of mixed signs. Clearly
there will be some compensation that lowers the risk of strong bias. As an
academic exercise, let us think of five independent variables each
described by the triangular distribution of Fig. 1 and five others each
described by a p.d.f. which is its mirror image reflected around
$x = 0.5$ [$0 < X < 2$, mode$(X) = 0.5$, $E[X] = 0.83$ and
$\sigma(X) = 0.42$]. The correct combination of the ten variables gives
$Y = 5.00 \pm 1.33$, while adding the modes and combining quadratically
left and right deviations we would get $5.00 \pm 1.54$.

3 Sources of asymmetric uncertainties

3.1 Non-parabolic $\chi^2$ or log-likelihood curves

The standard methods in physics to adjust theoretical parameters to
experimental data are based on maximum likelihood principle ideas. In
practice, depending on the situation, the 'minus log-likelihood' of the
parameters [$\varphi(\theta; \mathrm{data}) = -\ln L(\theta; \mathrm{data})$]
or the $\chi^2$ function of the parameters [i.e. the function
$\chi^2(\theta; \mathrm{data})$] is minimized. The notation used reminds
us that $\varphi$ and $\chi^2$ are seen as mathematical functions of the
parameters $\theta$, with the data acting as 'parameters' of the
functions. As is well understood, apart from an irrelevant constant not
depending on the fit parameters, $\varphi$ and $\chi^2$ differ by just a
factor of two when the likelihood, seen as a joint probability function or
a p.d.f. of the data, is a (multivariate) Gaussian distribution of the
data: $\varphi = \chi^2/2 + k$ (the constant $k$ is often neglected, since
we concentrate on the terms which depend on the fit parameters - but
sometimes $\chi^2$ and minus log-likelihood might differ by terms
depending on fit parameters!). For the sake of simplicity, let us take a
one-parameter fit. Following the usual practice, we indicate the parameter
by $\theta$ (though this fit parameter is just any of the input quantities
$\mathbf{X}$ of Sec. 2).

If $\varphi(\theta)$ or $\chi^2(\theta)$ have a nice parabolic shape, the
likelihood is, apart from a multiplicative factor, a Gaussian function 4
of $\theta$. In fact, as is well known from calculus, any function can be
approximated by a parabola in the vicinity of its minimum. Let us see in
detail the expansion of $\varphi(\theta)$ around its minimum $\theta_m$:

$$ \varphi(\theta) \approx \varphi(\theta_m) + \left.\frac{d\varphi}{d\theta}\right|_{\theta_m} (\theta - \theta_m) + \frac{1}{2} \left.\frac{d^2\varphi}{d\theta^2}\right|_{\theta_m} (\theta - \theta_m)^2 \qquad (4) $$
$$ \approx \varphi(\theta_m) + \frac{1}{2}\,\frac{1}{\alpha^2}\,(\theta - \theta_m)^2 , \qquad (5) $$

where the second term of the r.h.s. vanishes by definition of minimum and
we have indicated with $\alpha^2$ the inverse of the second derivative at
the minimum. Going back to the likelihood, we get:

$$ L(\theta; \mathrm{data}) \approx \exp[-\varphi(\theta_m)]\, \exp\left[-\frac{(\theta - \theta_m)^2}{2\alpha^2}\right] \qquad (6) $$
$$ = k\, \exp\left[-\frac{(\theta - \theta_m)^2}{2\alpha^2}\right] : \qquad (7) $$

apart from a multiplicative factor, this is a 'Gaussian' centered in
$\theta_m$ with standard deviation
$\alpha = (d^2\varphi/d\theta^2|_{\theta_m})^{-1/2}$. However, although
this function is mathematically a Gaussian, it does not yet have the
meaning of probability density $f(\theta\,|\,\mathrm{data})$ in an
inferential sense, i.e. describing our knowledge about $\theta$ in the
light of the experimental data. In order to do this, we need to process
the likelihood through Bayes' theorem, which allows probabilistic
inversions to be achieved using basic rules of probability theory and
logic. Apart from a conceptually irrelevant normalization factor (that has
to be calculated at some moment) the Bayes formula is

$$ f(\theta\,|\,\mathrm{data}) \propto f(\mathrm{data}\,|\,\theta) \cdot f_0(\theta) . \qquad (8) $$

4 But not yet a probability function! The likelihood has the probabilistic
meaning of a joint p.d.f. of the data given $\theta$, and not the other
way around.


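We can speak now about the "probability that $\theta$ is within a given
interval" and calculate it, together with expectation of $\theta$,
standard deviation and so on. 5 If the prior $f_0(\theta)$ is much vaguer
than what the data can teach us (via the likelihood), then it can be
re-absorbed in the normalization constant, and we get:

$$ f(\theta\,|\,\mathrm{data}) \propto f(\mathrm{data}\,|\,\theta) = L(\theta; \mathrm{data}) \qquad (9) $$
$$ \text{i.e.} \quad \propto \exp[-\varphi(\theta; \mathrm{data})] \qquad (10) $$
$$ \text{or} \quad \propto \exp\left[-\frac{\chi^2(\theta; \mathrm{data})}{2}\right] \qquad (11) $$
$$ \text{parabolic } \varphi \text{ or } \chi^2: \quad \rightarrow\ f(\theta\,|\,\mathrm{data}) = \frac{1}{\sqrt{2\pi}\,\sigma_\theta} \exp\left[-\frac{(\theta - E[\theta])^2}{2\sigma_\theta^2}\right] . \qquad (12) $$
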

If this is the case, it is a simple exercise to show that

a) $E[\theta]$ is equal to $\theta_m$, which minimizes the $\chi^2$ or
$\varphi$;

b) $\sigma_\theta$ can be obtained by the famous conditions
$\Delta\chi^2 = 1$ or $\Delta\varphi = 1/2$, respectively, or by the
second derivative around $\theta_m$:
$\sigma_\theta^{-2} = 1/2 \times (d^2\chi^2/d\theta^2)|_{\theta_m}$ or
$\sigma_\theta^{-2} = (d^2\varphi/d\theta^2)|_{\theta_m}$, respectively.
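Rules a) and b) can be checked numerically on a $\chi^2$ that is exactly
parabolic. The sketch below assumes four hypothetical measurements of a
single constant with Gaussian errors (numbers invented for illustration)
and compares the curvature-based $\sigma_\theta$ with the
$\Delta\chi^2 = 1$ crossings, found here by a simple bisection.

```python
import numpy as np

# Hypothetical data: four measurements of the same quantity, Gaussian errors.
values = np.array([4.8, 5.3, 5.1, 4.6])
errors = np.array([0.3, 0.4, 0.2, 0.5])

def chi2(theta):
    """chi^2 of a constant model; it is exactly parabolic in theta."""
    return float(np.sum(((values - theta) / errors) ** 2))

weights = 1.0 / errors**2
theta_m = float(np.sum(weights * values) / np.sum(weights))  # chi^2 minimum
chi2_min = chi2(theta_m)

# Rule b, curvature form: sigma^-2 = (1/2) d^2 chi^2 / d theta^2 at theta_m.
h = 1e-2
second_derivative = (chi2(theta_m + h) - 2.0 * chi2_min + chi2(theta_m - h)) / h**2
sigma_curvature = (0.5 * second_derivative) ** -0.5

def delta_chi2_crossing(direction, tol=1e-10):
    """Distance from theta_m at which chi^2 has risen by 1 (bisection)."""
    lo, hi = 0.0, 1.0
    while chi2(theta_m + direction * hi) - chi2_min < 1.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2(theta_m + direction * mid) - chi2_min < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

delta_minus = delta_chi2_crossing(-1.0)
delta_plus = delta_chi2_crossing(+1.0)
print(theta_m, sigma_curvature, delta_minus, delta_plus)
```

For a parabolic curve the three numbers printed after `theta_m` agree;
any disagreement among them on real data is a quick sign that the
parabolic approximation is failing.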



5 $\theta$ does not have a probabilistic interpretation in the
frequentistic approach, and therefore we cannot speak consistently, in
that framework, about its probability, or determine expectation, standard
deviation and so on. Most physicists do not even know of this problem and
think these are irrelevant semantic quibbles. However, it is exactly this
contradiction between intuitive thinking and cultural background|Sj that
causes wrong scientific conclusions, like those discussed in this paper.

Though in the frequentistic approach language and methods are usually
more convoluted (even when the same numerical results of the Bayesian
reasoning are obtained), due to the fact that probabilistic statements
about physics quantities and fit parameters are not allowed in that
approach, it is usually accepted that the above rules a and b are based on
the parabolic behavior of the minimized functions. When this approximation
does not hold, the frequentist has to replace a prescription by other
prescriptions that can handle the exception. 6 The situation is simpler
and clearer in the Bayesian approach, in which the above rules a and b do
hold too, but only as approximations under well defined conditions. In
case the underlying conditions fail we know immediately what to do:

• restart from Eq. (10) or from Eq. (11), depending on the other
underlying hypotheses;

• go even one step before Eq. (9), namely to the most general Eq. (8), if
priors matter (e.g. physical constraints, sensible previous knowledge,
etc.).

For example, if the $\chi^2$ description of the data was a good
approximation, then $f(\theta) \propto e^{-\chi^2/2}$, properly
normalized, is the solution to the problem. 7 A non-parabolic, asymmetric
$\chi^2$ produces an asymmetric $f(\theta)$ (see Fig. 2),
6 It is a matter of fact that the habit in the particle physics community
of uncritically applying the $\Delta\chi^2 = 1$ or $\Delta\varphi = 1/2$
rule is related to the use of the software package MINUIT. Indeed, MINUIT
can also calculate the parameter variances from the $\chi^2$ or $\varphi$
curvature at the minimum (which relies on the same hypothesis upon which
the $\Delta\chi^2 = 1$ or $\Delta\varphi = 1/2$ rules are based). But when
the $\chi^2$ or $\varphi$ are no longer parabolic, the standard deviation
calculated from the curvature differs from that of the $\Delta\chi^2 = 1$
or $\Delta\varphi = 1/2$ rule (in particular, when the minimized function
is asymmetric the latter rules give two values, the (in-)famous
$\Delta_\pm$ we are dealing with). People realize that the curvature at
the minimum depends on the local behavior of the minimized curve, and that
the $\Delta\chi^2 = 1$ or $\Delta\varphi = 1/2$ rule is typically more
stable. Therefore, in particle physics the latter rule has become a de
facto standard to evaluate 'confidence intervals' at different 'levels of
confidence' (depending on the value of the $\Delta\chi^2$ or
$\Delta\varphi$). But, unfortunately, when those famous curves are not
parabolic, numbers obtained by these rules might completely lose a
probabilistic meaning. [Sorry, a frequentist would object that, indeed,
these numbers do not have probabilistic meaning about $\theta$, but they
are 'confidence intervals' at such and such 'confidence level', because
'$\theta$ is a constant of unknown value', etc. . . Good luck!]

7 To be precise, this approximation is valid if the parameters appear only
in the argument of the exponent. In practice this means that the fitted
parameters must not appear in the covariance matrix on which the $\chi^2$
depends. A simple example in which this approximation does not hold is
that of a linear fit in which the standard deviation $\sigma$ describing
the errors along the ordinate is also unknown. The joint inference about
the line coefficients $m$ and $c$ and $\sigma$, having observed $n$
points, is achieved by
$f(m, c, \sigma) \propto \sigma^{-n} e^{-\chi^2/2}$ (see Sec. 8.2 of
Ref. 0).

Figure 2: Example (Ref. [3]) of asymmetric $\chi^2$ curve (left plot) with
a $\chi^2$ minimum at $\mu = 5$ ($\mu$ stands for the value of a generic
physics quantity). The result based on the $\chi^2_{min} + 1$
'prescription' is compared (plot on the right side) with the p.d.f. based
on a uniform prior, i.e.
$f(\mu\,|\,\mathrm{data}) \propto \exp[-\chi^2/2]$.

the mode of which corresponds, indeed, to what is obtained by minimizing
$\chi^2$, but expected value and standard deviation differ from what is
obtained by the 'standard rule'. In particular, expected value and
variance must be evaluated from their definitions:

$$ E[\theta] = \int \theta\, f(\theta\,|\,\mathrm{data})\, d\theta , \qquad (13) $$
$$ \sigma_\theta^2 = \int (\theta - E[\theta])^2\, f(\theta\,|\,\mathrm{data})\, d\theta . \qquad (14) $$

Other examples of asymmetric $\chi^2$ curves, including the case with more
than one minimum, are shown in Chapter 12 of Ref. [3], and compared with
the results coming from frequentist prescriptions (but, indeed, there is
no generally accepted rule to get frequentistic results - whatever they
mean - when the $\chi^2$ shape gets complicated).
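Equations (13)-(14) are easy to evaluate numerically once the $\chi^2$ or
$\varphi$ curve is available on a grid. As a minimal sketch, assume a
Poisson likelihood with a hypothetical observed count `n_obs = 5` and a
flat prior (this is an illustration chosen here, not the curve of Fig. 2).
In this case the normalized $e^{-\varphi}$ is a Gamma distribution, so the
exact values $E[\theta] = n + 1$ and $\sigma_\theta = \sqrt{n + 1}$ are
available as a cross-check.

```python
import numpy as np

n_obs = 5  # hypothetical observed number of counts

theta = np.linspace(1e-3, 40.0, 40_000)
dtheta = theta[1] - theta[0]

# Minus log-likelihood of a Poisson mean, up to a theta-independent constant.
phi = theta - n_obs * np.log(theta)
phi_min = phi.min()

# Flat prior: f(theta | data) proportional to exp(-phi), normalized on the grid.
posterior = np.exp(-(phi - phi_min))
posterior /= posterior.sum() * dtheta

mode = float(theta[np.argmax(posterior)])
mean = float(np.sum(theta * posterior) * dtheta)                          # Eq. (13)
sigma = float(np.sqrt(np.sum((theta - mean) ** 2 * posterior) * dtheta))  # Eq. (14)

# The Delta(phi) = 1/2 prescription around the minimum.
inside = theta[phi - phi_min <= 0.5]
delta_minus = mode - float(inside[0])
delta_plus = float(inside[-1]) - mode

print(f"mode = {mode:.2f}, Delta- = {delta_minus:.2f}, Delta+ = {delta_plus:.2f}")
print(f"E[theta] = {mean:.2f}, sigma = {sigma:.2f}")
print(f"(Delta+ + Delta-)/2 = {0.5 * (delta_plus + delta_minus):.2f}")
```

The expected value lies to the right of the mode, on the side of the
larger deviation, which is the qualitative effect discussed in the text.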

Unfortunately, it is not easy to translate numbers obtained by ad hoc
rules into probabilistic results, because the dependence on the actual
shape of the $\chi^2$ or $\varphi$ curve can be non-trivial. Anyhow, some
rules of thumb can be given in next-to-simple situations where the
$\chi^2$ or $\varphi$ has only one minimum and the $\chi^2$ or $\varphi$
curve looks like a 'skewed parabola', like in Fig. 2:

• the 68% 'confidence interval' obtained by the $\Delta\chi^2 = 1$, or
$\Delta\varphi = 1/2$, rule still provides a 68% probability interval for
$\theta$;

• the standard deviation obtained using Eq. (14) is approximately equal
to the average between the $\Delta_+$ and $\Delta_-$ values obtained by
the $\Delta\chi^2 = 1$, or $\Delta\varphi = 1/2$, rule:

$$ \sigma_\theta \approx \frac{\Delta_+ + \Delta_-}{2} ; \qquad (15) $$

• the expected value is equal to the mode ($\theta_m$, coinciding with the
maximum likelihood or minimum $\chi^2$ value) plus a shift:

$$ E[\theta] \approx \theta_m + O(\Delta_+ - \Delta_-) . \qquad (16) $$

[This latter rule is particularly rough because $E[\theta]$ is more
sensitive than $\sigma_\theta$ to the exact shape of the $\chi^2$ or
$\varphi$ curve. Equation (16) has to be taken only to get an idea of the
order of magnitude of the effect. For example, in the case depicted in
Fig. 2 the shift is 80% of $(\Delta_+ - \Delta_-)$.]

Figure 3: Example of two-dimensional multi-spot "68% CL" and "95% CL"
contours obtained by slicing the $\chi^2$ or the minus log-likelihood
curve at some magic levels. What do they mean?

The remarks about the misuse of the $\Delta\chi^2 = 1$ and
$\Delta\varphi = 1/2$ rules can be extended to cases where several
parameters are involved. I do not want to go into details (in the Bayesian
approach there is nothing deeper than studying $k\,e^{-\chi^2/2}$ or
$k\,e^{-\varphi}$ as a function of several parameters 8), but I just want
to get the reader worried about the meaning of contour plots of the kind
shown in Fig. 3.

8 See footnote 7 concerning a possible pitfall in the use of
$k\,e^{-\chi^2/2}$.

Figure 4: Propagation of a Gaussian distribution under a nonlinear
transformation. The $f(y_i)$ were obtained analytically using Eq. (1)
(part of Fig. 12.2 of Ref.jH]).

3.2 Nonlinear propagation 

Another source of asymmetric uncertainties is the nonlinear dependence of
the output quantity $Y$ on some of the input quantities $\mathbf{X}$ in a
region a few standard deviations around $E[\mathbf{X}]$. This problem has
been studied in great detail in Ref. [2], also taking into account
correlations on input and output quantities, and somewhat summarized in
Ref. [Hj. Let us recall here only the most relevant outcomes, in the
simplest case of only one output quantity $Y$ and neglecting correlations.

Figure 4 shows a nonlinear dependence between $X$ and $Y$ and how a
Gaussian distribution has been distorted by the transformation [$f(y)$ has
been evaluated analytically using Eq. (1)]. As a result of the nonlinear
transformation, mode, mean, median and standard deviation are transformed
in non-trivial ways (in the example of Fig. 4 the mode moves left and the
expected value right). In the general case the complete calculations
should be performed, either analytically, or numerically or by Monte
Carlo. Fortunately, as has been shown in Ref. [2], a second order
expansion is often enough to take into account small deviations from
linearity. The resulting formulae are still compact and depend on location
and shape parameters of the initial distributions.

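Second order propagation formulae depend on first and second derivatives.
In practical cases (especially as far as the contributions from systematic
effects are concerned) the derivatives are obtained numerically 9 as

$$ \left.\frac{\partial Y}{\partial X}\right|_{E[X]} \approx \frac{1}{2}\left[\frac{\Delta_+}{\sigma(X)} + \frac{\Delta_-}{\sigma(X)}\right] = \frac{\Delta_+ + \Delta_-}{2\,\sigma(X)} , \qquad (17) $$
$$ \left.\frac{\partial^2 Y}{\partial X^2}\right|_{E[X]} \approx \frac{1}{\sigma(X)}\left[\frac{\Delta_+}{\sigma(X)} - \frac{\Delta_-}{\sigma(X)}\right] = \frac{\Delta_+ - \Delta_-}{\sigma^2(X)} , \qquad (18) $$
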

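where $\Delta_-$ and $\Delta_+$ now stand for the left and right
deviations of $Y$ when the input variable $X$ varies by one standard
deviation around $E[X]$. Second order propagation formulae are
conveniently given in Ref. [2] in terms of the $\Delta_\pm$ deviations 10.
For $Y$ that depends only on a single input $X$ we get:

$$ E(Y) \approx Y(E[X]) + \delta , \qquad (21) $$
$$ \sigma^2(Y) \approx \overline{\Delta}^2 + 2\,\overline{\Delta}\cdot\delta\cdot\mathcal{S}(X) + \delta^2\cdot[\mathcal{K}(X) - 1] , \qquad (22) $$

where $\delta$ is the semi-difference of the two deviations and
$\overline{\Delta}$ is their average:

$$ \delta = \frac{\Delta_+ - \Delta_-}{2} , \qquad (23) $$
$$ \overline{\Delta} = \frac{\Delta_+ + \Delta_-}{2} , \qquad (24) $$

while $\mathcal{S}(X)$ and $\mathcal{K}(X)$ stand for skewness and
kurtosis of the input variable. 11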
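The single-input formulae (21)-(24) translate directly into code. The
sketch below obtains $\Delta_\pm$ by moving the input by one standard
deviation, as described above; the function name and the test case
$Y = X^2$ with a Gaussian $X$ of mean 3 and standard deviation 1 are
assumptions made for illustration.

```python
import math

def second_order_propagation(func, mean_x, sigma_x, skewness=0.0, kurtosis=3.0):
    """Second order propagation of one input through func, Eqs. (21)-(24)."""
    y0 = func(mean_x)
    delta_plus = func(mean_x + sigma_x) - y0      # right deviation of Y
    delta_minus = y0 - func(mean_x - sigma_x)     # left deviation of Y
    shift = 0.5 * (delta_plus - delta_minus)      # delta, Eq. (23)
    average = 0.5 * (delta_plus + delta_minus)    # Delta-bar, Eq. (24)
    expected_y = y0 + shift                       # Eq. (21)
    variance_y = (average**2
                  + 2.0 * average * shift * skewness
                  + shift**2 * (kurtosis - 1.0))  # Eq. (22)
    return expected_y, math.sqrt(variance_y), delta_minus, delta_plus

# Hypothetical case: Y = X^2, X Gaussian with E[X] = 3 and sigma(X) = 1.
expected_y, sigma_y, d_minus, d_plus = second_order_propagation(
    lambda x: x * x, 3.0, 1.0)
print(f"Y(E[X]) = 9, Delta- = {d_minus}, Delta+ = {d_plus}")
print(f"E(Y) = {expected_y}, sigma(Y) = {sigma_y:.3f}")
```

Because this $Y$ is exactly quadratic, the second order result coincides
with the exact moments $E[Y] = \mu^2 + \sigma^2$ and
$\sigma^2(Y) = 4\mu^2\sigma^2 + 2\sigma^4$; for a general function it is
only an approximation.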



9 Note that sometimes people do not get asymmetric uncertainties, not
because the propagation is approximately linear, but because the asymmetry
is hidden by the standard propagation formula! Therefore also in this case
the approximation might produce a bias in the result (for example, the
second order formula of the expected value of the ratio of two quantities
is known to experts [5]). The merit of numerical derivatives is that at
least they show the asymmetries.

10 In terms of analytically calculated derivatives, $\delta$ and
$\overline{\Delta}$ are given by

$$ \delta \approx \frac{1}{2} \left.\frac{\partial^2 Y}{\partial X^2}\right|_{E[X]} \sigma^2(X) , \qquad (19) $$
$$ \overline{\Delta} \approx \left.\frac{\partial Y}{\partial X}\right|_{E[X]} \sigma(X) . \qquad (20) $$

11 After what we have seen in Sec. [5] we should not forget that the input
quantities

For many input quantities we have

$$ E(Y) \approx Y(E[\mathbf{X}]) + \sum_i \delta_i , \qquad (25) $$
$$ \sigma^2(Y) \approx \sum_i \sigma^2_{X_i}(Y) , \qquad (26) $$

where $\sigma^2_{X_i}(Y)$ stands for each individual contribution to
Eq. (22). The expression of the variance gets simplified when all input
quantities are Gaussian (a Gaussian has skewness equal to 0 and kurtosis
equal to 3):

$$ \sigma^2(Y) \approx \sum_i \overline{\Delta}_i^2 + 2 \sum_i \delta_i^2 , \qquad (27) $$

and, as long as the $\delta_i^2$ are much smaller than the
$\overline{\Delta}_i^2$, we get the convenient approximate formulae

$$ E(Y) \approx Y(E[\mathbf{X}]) + \sum_i \delta_i , \qquad (28) $$
$$ \sigma^2(Y) \approx \sum_i \overline{\Delta}_i^2 , \qquad (29) $$

valid also for other symmetric input p.d.f.'s (the kurtosis is about 2 to
3 in typical distributions and its exact value is irrelevant if the
condition $\sum_i \delta_i^2 \ll \sum_i \overline{\Delta}_i^2$ holds). The
resulting practical rules (28)-(29) are quite simple:

• the expected value of $Y$ is shifted by the sum of the individual
shifts, each given by the semi-difference of the deviations $\Delta_\pm$;

• each input quantity contributes (in quadrature) to the combined
standard uncertainty with a term which is approximately the average of
the deviations $\Delta_\pm$.

Moreover, if there are many contributions to the uncertainty, the final
uncertainty will be symmetric and approximately Gaussian, thanks to the
central limit theorem.
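The practical rules (28)-(29), with the optional Gaussian correction of
Eq. (27), amount to a few sums over the $\{\Delta_-, \Delta_+\}$ pairs.
The following sketch uses a hypothetical helper `combine_contributions`
and three invented pairs of deviations; nothing in it comes from a real
measurement.

```python
import math

def combine_contributions(y_nominal, deviations, gaussian_correction=False):
    """Combine a list of (delta_minus, delta_plus) pairs, one per input."""
    shifts = [0.5 * (d_plus - d_minus) for d_minus, d_plus in deviations]
    averages = [0.5 * (d_plus + d_minus) for d_minus, d_plus in deviations]
    expected_y = y_nominal + sum(shifts)                  # Eq. (28)
    variance = sum(a * a for a in averages)               # Eq. (29)
    if gaussian_correction:
        variance += 2.0 * sum(s * s for s in shifts)      # Eq. (27)
    return expected_y, math.sqrt(variance)

# Hypothetical nominal value and deviations (delta_minus, delta_plus).
deviations = [(0.5, 0.7), (0.2, 0.1), (1.0, 1.4)]
e_simple, s_simple = combine_contributions(10.0, deviations)
e_gauss, s_gauss = combine_contributions(10.0, deviations, gaussian_correction=True)
print(f"Eqs. (28)-(29): {e_simple:.2f} +/- {s_simple:.2f}")
print(f"with Eq. (27):  {e_gauss:.2f} +/- {s_gauss:.2f}")
```

Comparing the two standard deviations gives a quick indication of whether
the condition $\sum_i \delta_i^2 \ll \sum_i \overline{\Delta}_i^2$ is
reasonably satisfied.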

could have non trivial shapes. Since skewness and kurtosis are related to
the 3rd and 4th moments of the distribution, Eq. (22) makes use of up to
the 4th moment and is definitely better than the usual propagation
formula, which uses only second moments. In Ref. [5] approximate formulae
are also given for skewness and kurtosis of the output variable, from
which it is possible to reconstruct $f(y)$ taking into account up to the
4th order moment of the distribution.

3.3 Uncertainty due to systematics

Finally, and this is often the case that we see in publications,
asymmetric uncertainty results from systematic effects. The Bayesian
approach offers a natural and clear way to treat systematics - and I smile
at the many attempts 12 of 'squaring the circle' using frequentistic
prescriptions. . . - simply because probabilistic concepts are
consistently applied to all influence quantities that can have an effect
on the quantity of interest and whose value is not precisely known.
Therefore we can treat them using probabilistic methods. This was also
recognized by metrological organizations [14].

Indeed, there is no need to treat systematic effects in a special way.
They are treated as any of the many input quantities $\mathbf{X}$
discussed in Sec. 3.2 and, in fact, their asymmetric contributions
frequently come from their nonlinear influence on the quantity of
interest. The only word of caution, on which I would like to insist, is to
use expected value and standard deviation for each systematic effect. In
fact, sometimes the uncertainty about the value of the influence
quantities that contribute to systematics is intrinsically asymmetric.

I would also like to comment briefly on results where either of the
$\Delta_\pm$ is negative, for example LO+qJ (see e.g. Ref. ,F to have an
idea of the variety of signs of $\Delta_\pm$). This means that we are in
the proximity of a minimum (or a maximum if $\Delta_+$ were negative) of
the function $Y = Y(X_i)$. It can be shown [Him that Eqs. (21)-(22) hold
for this case too. 13
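A simple worked case shows how the formulae behave when $\Delta_-$ is
negative. Assume, purely for illustration, $Y = X^2$ with a Gaussian $X$
of zero expected value and standard deviation $\sigma$, so that $E[X]$
sits exactly at the minimum of $Y(X)$. Applying Eqs. (21)-(24):

```latex
\begin{align*}
\Delta_+ &= Y(\sigma) - Y(0) = \sigma^2, &
\Delta_- &= Y(0) - Y(-\sigma) = -\sigma^2, \\
\delta &= \frac{\Delta_+ - \Delta_-}{2} = \sigma^2, &
\overline{\Delta} &= \frac{\Delta_+ + \Delta_-}{2} = 0, \\
E(Y) &\approx Y(0) + \delta = \sigma^2, &
\sigma^2(Y) &\approx \delta^2\,[\mathcal{K}(X) - 1] = 2\,\sigma^4 .
\end{align*}
```

Both results coincide with the exact $E[X^2] = \sigma^2$ and
$\mathrm{Var}[X^2] = 2\sigma^4$ of a Gaussian, as expected since this $Y$
is exactly quadratic, whereas the linear formulae would give a vanishing
shift and a vanishing uncertainty.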

For further details about the meaning and treatment of uncertainties due
to systematics and their relations to ISO Type B uncertainties [Tl], see
Refs. [2] and 0.

4 Some rules of thumb to unfold probabilistically sensible information from results published with asymmetric uncertainties

Having understood what one should have done to obtain expected value and 
standard deviation in the situations in which people are used to report asym- 

12 It has been studied by psychologists how sometimes our efforts to solve a problem are 
the analogous with the moves along elements of a group structure (in the mathematical 
sense) . There is no way to reach a solution until we not break out of this kind of trapping 
psychological or cultural cages. [T3] 

13 In this special case there should be no doubt that a shift should be applied to the best 
value, since moving Xi by ±a(Xi) around its expected value E[Xi] the final quantity Y 
only moves in one side of Y(E[Xi]). 
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metric uncertainties, we might attempt to recover those quantities from the 
published result. This can be done exactly only if we know the detailed
contributions to the uncertainty, namely the $\chi^2$ or log-likelihood functions
of the so-called 'statistical terms' and the pairs $\{\Delta_{+_i}, \Delta_{-_i}\}$, together with the
probabilistic model, for each 'systematic term'. However, these pieces of
information are usually unavailable. Lacking other information, we can still
make some guesses based on rough assumptions:

• asymmetric uncertainties in the 'statistical part' are due to an asymmetric
$\chi^2$ or log-likelihood: → apply the corrections given by Eqs. (15)-(16);

• asymmetric uncertainties in the 'systematic part' come from nonlinear
propagation: → apply the corrections given by Eqs. (28)-(29).



As a numerical example, imagine we read the following result (in arbitrary 
units): 

$$Y = 6.0\,^{+1.0}_{-2.0}\,^{+0.3}_{-0.9}, \tag{30}$$

(which somebody would summarize as $6.0^{+1.0}_{-2.2}$!). The only certainty we have,
seeing two asymmetric uncertainties with the same sign of skewness, is that
the result is definitely biased. Let us try to estimate the bias
and calculate the corrected result (which, notwithstanding all uncertainties
about uncertainties, will be closer to the 'truth' than the published one):

1. the first contribution gives roughly [see Eqs. (15)-(16)]:

$$\delta_1 \approx -1.0 \tag{31}$$
$$\sigma_1 \approx 1.5\,; \tag{32}$$

2. for the second contribution we have [see Eqs. (2…)-(2…), (28)-(29)]:

$$\delta_2 \approx -0.31 \tag{33}$$
$$\sigma_2 \approx 0.62\,. \tag{34}$$

Our guessed best result would then become 14

$$Y \approx 4.69 \pm 1.5 \pm 0.62 = 4.69 \pm 1.62 \tag{35}$$
$$\phantom{Y} \approx 4.7 \pm 1.6\,. \tag{36}$$

14 The ISO Guide [14] recommends giving the result with the standard deviation in
parentheses, instead of using the ±xx notation. In this example we would have
$Y \approx 4.69\,(1.5)\,(0.62) = 4.69\,(1.62) \Rightarrow Y \approx 4.7\,(1.6)$. Personally, I do not think this is a very
important issue as long as we know what the quantity xx means. Anyhow, I understand
the ISO rationale, and perhaps the proposed notation could help to make a break with the
'confidence intervals'.
Figure 5: Monte Carlo estimate of the shape of the p.d.f. of the sum of three
independent variables, one described by the p.d.f. of Fig. 2 and the other two by
the triangular distribution of Fig. ^

(The excess digits in the intermediate steps are kept only to allow a
numerical comparison with the correct result, which will be given shortly.)
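The arithmetic behind Eqs. (31)-(36) can be sketched in a few lines. This is only an illustration under stated assumptions: the two correction functions use the simple forms suggested by the numbers quoted above (shift $\Delta_+ - \Delta_-$ for an asymmetric $\chi^2$ or log-likelihood, shift $(\Delta_+ - \Delta_-)/2$ for nonlinear propagation, and $\sigma \approx (\Delta_+ + \Delta_-)/2$ in both cases), the function names are hypothetical, and the inputs 0.31 and 0.93 for the second contribution are assumed unrounded deviations chosen to reproduce Eqs. (33)-(34).

```python
import math

def stat_correction(d_plus, d_minus):
    """Assumed rule of thumb for an asymmetric chi2 / log-likelihood:
    shift ~ d_plus - d_minus, sigma ~ average of the two deviations."""
    return d_plus - d_minus, (d_plus + d_minus) / 2.0

def syst_correction(d_plus, d_minus):
    """Assumed rule of thumb for nonlinear propagation:
    shift ~ (d_plus - d_minus) / 2, sigma ~ average of the two deviations."""
    return (d_plus - d_minus) / 2.0, (d_plus + d_minus) / 2.0

def corrected_result(best, stat, syst):
    """Apply both shifts to the published best value and combine
    the two standard deviations in quadrature."""
    shift1, sigma1 = stat_correction(*stat)
    shift2, sigma2 = syst_correction(*syst)
    return best + shift1 + shift2, math.hypot(sigma1, sigma2)

# Published value 6.0 with assumed deviations (+1.0, -2.0) and (+0.31, -0.93).
value, sigma = corrected_result(6.0, stat=(1.0, 2.0), syst=(0.31, 0.93))
print(f"Y = {value:.2f} +/- {sigma:.2f}")  # Y = 4.69 +/- 1.62
```

Under the same assumptions, feeding a single summarized pair of deviations (1.0, 2.2) to `stat_correction` or `syst_correction` gives shifts of -1.2 or -0.6 with a standard deviation of 1.6, which is how two different guesses about the source of the asymmetry lead to two different corrected values.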

If we had the chance to learn that the result of Eq. (30) was due to the
asymmetric $\chi^2$ fit of Fig. 2 plus two systematic corrections, each described
by the triangular distribution of Fig. ^, then we could calculate expectation
and variance exactly:

$$E(Y) = 4.2 + 2 \times 0.17 = 4.54 \tag{37}$$
$$\sigma^2(Y) = 1.5^2 + 2 \times 0.42^2 = 1.61^2\,, \tag{38}$$

i.e. $Y = 4.54 \pm 1.61$, quite different from Eq. (30) and close to the result
corrected by rule of thumb formulae. Indeed, knowing exactly the ingredients,
we can evaluate f(y) from Eq.Q as 

$$f(y) = \int \delta(y - x_1 - x_2 - x_3)\, f_1(x_1)\, f_2(x_2)\, f_3(x_3)\, dx_1\, dx_2\, dx_3\,, \tag{39}$$

although by Monte Carlo. The result is given in Fig. 5, from which we can
evaluate a mean value of 4.54 and a standard deviation of 1.65, in perfect
agreement with the figures given in Eqs. (37)-(38). 15 As we can see from

15 The slight difference between the standard deviations comes from rounding, since
$\sigma(f_1) = 1.5$ of Fig. 5 is the rounded value of 1.54. Replacing 1.5 by 1.54 in Eq. (38), we
get exactly the Monte Carlo value of 1.65.
the figure, even those who like to think of the 'best value' in terms of the most
probable value have to realize once more that the most probable value of
a sum is not necessarily equal to the sum of the most probable values of the
addends (and analogous statements hold for all combinations of uncertainties 16 ).
In the distribution of Fig. 5 the mode is around 5. [Note
that expected value and variance are equal to those given by Eqs. (37)-(38),
since in the case of a linear combination they can be obtained exactly.]
Other statistical quantities that can be extracted from the distribution are the
median, equal to 4.67, and some 'quantiles' (values at which the cumulative
distribution reaches a given percentage of its maximum, the median being the
50% quantile). Interesting quantiles are the 15.85%, 25%, 75% and 84.15% ones,
for which the Monte Carlo gives the following values of Y: 2.88, 3.49, 5.72
and 6.18. From these values we can calculate the central 50% and 68.3%
intervals, 17 which are [3.49, 5.72] and [2.88, 6.18], respectively. Again, the
information provided by Eq. (30) is far from any reasonable way of expressing
the uncertainty about Y, given the information on each component.
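The two statements above (expected values and variances add exactly in a linear combination, most probable values do not) can be checked with a small Monte Carlo. The sketch below is not the calculation behind Fig. 5: the p.d.f.'s of the figures are not reproduced here, so all three inputs are hypothetical triangular stand-ins. The first is chosen to have mode 5, mean about 4.2 and standard deviation about 1.54; the other two are triangular on [-1, 1] with mode 0.5, which gives mean about 0.17 and standard deviation about 0.42. The quantiles it returns are therefore not expected to match those quoted in the text.

```python
import math
import random

random.seed(12345)   # fixed seed so the run is reproducible
N = 200_000

def sample_y():
    # Hypothetical stand-in for the asymmetric fit p.d.f.:
    # triangular with mode 5, mean ~4.2, sigma ~1.54.
    x1 = random.triangular(0.092, 7.508, 5.0)
    # Assumed systematic corrections: triangular on [-1, 1], mode 0.5
    # (mean ~0.17, sigma ~0.42).
    x2 = random.triangular(-1.0, 1.0, 0.5)
    x3 = random.triangular(-1.0, 1.0, 0.5)
    return x1 + x2 + x3

ys = sorted(sample_y() for _ in range(N))

mean = math.fsum(ys) / N
sigma = math.sqrt(math.fsum((y - mean) ** 2 for y in ys) / N)

def quantile(p):
    """Empirical quantile read off the sorted sample."""
    return ys[int(p * (N - 1))]

median = quantile(0.5)
sum_of_modes = 5.0 + 0.5 + 0.5   # what adding 'most probable values' would give

print(f"mean = {mean:.2f}, sigma = {sigma:.2f}, median = {median:.2f}")
print(f"central 50%   interval: [{quantile(0.25):.2f}, {quantile(0.75):.2f}]")
print(f"central 68.3% interval: [{quantile(0.1585):.2f}, {quantile(0.8415):.2f}]")
print(f"sum of the modes of the addends: {sum_of_modes:.1f}")
```

With these stand-ins the sample mean and standard deviation come out close to 4.2 + 2 x 0.17 and to the quadratic sum of 1.54, 0.42 and 0.42, while the bulk of the distribution sits well below the sum of the three modes.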

16 Discussing these issues with several people I have realized, to my great surprise, that
this misconception is deeply rooted and strenuously defended by many colleagues, even
by data analysis experts (they constantly reply "yes, but..."). This attitude is probably
one of the consequences of being anchored to what I call un-needed principles (namely
maximum likelihood, in this case), such that even the digits resulting from these principles
are taken with a kind of religious respect and it seems blasphemous to touch them.

17 I give the central 68.3% interval with some reluctance, because I know by experience
that in many minds the short circuit

"68% probability interval" ↔ "sigma"

is almost unavoidable (I have known physicists convinced, and who even taught it, that
the standard deviation only 'makes sense for the Gaussian' and that it was defined via the
'68% rule'). For this reason, I have recently started to appreciate thinking in terms of 50%
probability intervals, also because they force people to reason in terms of the better perceived
fifty-fifty bets. I find this kind of bet very enlightening in showing why practically all
standard ways (including Bayesian ones!) fail to report upper/lower confidence limits in
frontier situations characterized by open likelihoods (see chapter 12 in Ref. [3]). I like
to ask "please use your method and give me a 50% C.L. upper/lower limit", and then,
when I have got it, "are you really 50% confident that the value is below that limit and
50% confident that it is above it? Would you equally bet on either side of that limit?".
And the supporters of 'objective' methods are immediately at a loss. (At least those who
use Bayesian formulae realize that there must be some problem with the choice of priors.)
Besides the lucky case 18 of this numerical example (which was not constructed
on purpose, but just recycles some material from Ref. [3]), it seems
reasonable that even results roughly corrected by rule of thumb formulae are
already better than those published directly with asymmetric uncertainties. 19 But
the accurate analysis can only be done by the authors, who know the details
of the individual contributions to the uncertainty.

5 Conclusions 

Asymmetric uncertainties do exist and there is no way to remove them
artificially. If they are not properly treated, i.e. if one uses prescriptions that
have no theoretical ground but are more or less rooted in the physics
community, the published result is biased. Instead, if they are properly
treated using probability theory, in most cases of interest the final result is
practically symmetric and approximately Gaussian, with expected value and
standard deviation which take into account the several shifts due to the
individual asymmetric contributions. Note that some of the simplified methods
of statistical analysis had a raison d'etre many years ago, when
computation was a serious limitation. It is no longer a problem to
evaluate, analytically or numerically, integrals of the kind of those appearing
e.g. in Eqs.JU), (TT3I) and (till). 

In case the final uncertainty remains asymmetric, the authors should
provide detailed information about the 'shape of the uncertainty', giving also
the most probable value, probability intervals, and so on. But the best estimate
of the expected value and standard deviation should always be given (see also
the ISO Guide [14]).

To conclude, I would like to leave the final word to my favourite quotation,
with which I like to end seminars and courses on probability theory
applied to the evaluation and the expression of uncertainty in measurements:

18 In the example here we have been lucky because an over-correction of the first contri- 
bution was compensated by an under-correction of the second contribution. Note also that 
the hypothesis about the nonlinear propagation was not correct, because we had, instead, 
a linear propagation of asymmetric p.d.f.'s. Anyhow the overall shift calculated by the 
guessed hypothesis is comparable to that calculable knowing the details of the analysis 
(and, in any case, using in subsequent analyses the roughly corrected result is definitely 
better than sticking to the published 'best value'). 

19 Note that even if we were told that Y was $6.0^{+1.0}_{-2.2}$ without further information, we
could still try to apply some shift to the result, obtaining 4.8 ± 1.6 or 5.4 ± 1.6 depending
on some guesses about the source of the asymmetry. In any case, either result is better
than $6.0^{+1.0}_{-2.2}$!
"Although this Guide provides a framework for assessing uncertainty,
it cannot substitute for critical thinking, intellectual honesty, and
professional skill. The evaluation of uncertainty is neither a routine task
nor a purely mathematical one; it depends on detailed knowledge of
the nature of the measurand and of the measurement. The quality and
utility of the uncertainty quoted for the result of a measurement
therefore ultimately depend on the understanding, critical analysis, and
integrity of those who contribute to the assignment of its value." [14]



It is a pleasure to thank Superfaber (Fabrizio Fabbri in hepnames) for helpful 
discussions on the subject and for his supervision of the manuscript. 